library(fda)
library(fda.usc)
library(corrplot)
library(kableExtra)
library(leaflet)
data(aemet)
data.temp<- t(aemet$temp$data)
First, we smooth the data. Previous assignments showed that the best number of basis functions was \(K = 21\), so we smooth the average daily temperature curves of the Spanish stations with 21 Fourier basis functions.
Fourier basis
fourier.basis <- create.fourier.basis(rangeval=c(0,365),nbasis=21)
Smooth temperature data
smooth.data.temp <- smooth.basis(argvals=1:365,y=data.temp,fdParobj=fourier.basis)
plot(smooth.data.temp,
lty=1,lwd=2,col="chartreuse3",
main="Smoothing temperature data with 21 Fourier basis functions",
xlab="Daily observations",
ylab="Average temperature (Celsius degrees)")
## [1] "done"
The first family of unsupervised methods that we apply is based on basis expansions: the unsupervised classification is carried out on the set of coefficients \(\widehat{c}_{i1}, \ldots, \widehat{c}_{iK}\), for \(i=1, \ldots, n\), on the assumption that the distances between the functional observations resemble the distances between their basis coefficients.
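The link between distances among curves and distances among coefficients can be checked numerically. The sketch below is only an illustration under stated assumptions: it builds a small orthonormal Fourier basis by hand on \([0, 365]\) (it does not use the `fda` basis object), and the two coefficient vectors `c1` and `c2` are hypothetical values. For an orthonormal basis, the \(L^2\) distance between two expansions should coincide with the Euclidean distance between their coefficient vectors.

```r
# Hand-built orthonormal Fourier basis on [0, 365] (illustrative, not the fda object).
period <- 365
t_grid <- seq(0, period, length.out = 20001)
basis <- cbind(
  const = rep(1 / sqrt(period), length(t_grid)),
  sin1  = sqrt(2 / period) * sin(2 * pi * t_grid / period),
  cos1  = sqrt(2 / period) * cos(2 * pi * t_grid / period)
)

# Two hypothetical coefficient vectors (assumed values, for illustration only).
c1 <- c(280, -30, -50)
c2 <- c(400, -33, -34)

# Evaluate both expansions on the grid.
f1 <- as.numeric(basis %*% c1)
f2 <- as.numeric(basis %*% c2)

# L2 distance between the curves, approximated with the trapezoidal rule.
sq_diff <- (f1 - f2)^2
h <- diff(t_grid)
l2_dist <- sqrt(sum(h * (head(sq_diff, -1) + tail(sq_diff, -1)) / 2))

# Euclidean distance between the coefficient vectors.
coef_dist <- sqrt(sum((c1 - c2)^2))

print(c(l2_dist = l2_dist, coef_dist = coef_dist))
```

When the basis is not orthonormal, the two distances generally differ, so clustering on raw coefficients is then only an approximation of clustering on the curves.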
The clustering procedures that will be used are k-means, PAM, agglomerative hierarchical clustering, and model-based clustering.
To compare the performance of the methods, we use the average silhouette width for the partitional and hierarchical clusterings and the BIC for the model-based clustering. The higher the average silhouette width and the fewer the observations with a negative silhouette, the better the clustering. For the model-based method, we look for the largest value of \(-\mathrm{BIC}\).
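As a reminder of what the silhouette measures, the following sketch computes the silhouette width \(s(i) = (b_i - a_i)/\max(a_i, b_i)\) by hand, where \(a_i\) is the mean distance from observation \(i\) to the other members of its cluster and \(b_i\) is the smallest mean distance to the members of any other cluster, and compares it with `cluster::silhouette`. The one-dimensional data `x` and the labels `cl` are toy values chosen for illustration, not part of the temperature data.

```r
library(cluster)

# Toy data: two obvious groups on the real line (assumed values).
x  <- c(0, 1, 2, 10, 11)
cl <- c(1L, 1L, 1L, 2L, 2L)
d  <- as.matrix(dist(x))

# Silhouette width of each observation, computed from the definition.
sil_by_hand <- sapply(seq_along(x), function(i) {
  same   <- setdiff(which(cl == cl[i]), i)
  a      <- mean(d[i, same])                      # cohesion within own cluster
  others <- setdiff(unique(cl), cl[i])
  b      <- min(sapply(others, function(g) mean(d[i, cl == g])))  # nearest other cluster
  (b - a) / max(a, b)
})

# Same quantity from the cluster package.
sil_pkg <- as.numeric(silhouette(cl, dist(x))[, "sil_width"])

print(round(cbind(sil_by_hand, sil_pkg), 4))
mean(sil_pkg)  # average silhouette width used to compare partitions
```

Values close to 1 indicate well-placed observations, while negative values indicate observations that are, on average, closer to another cluster than to their own.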
Next, we present the coefficients of the basis expansions. The pairs plot shows that the data are grouped and that some observations appear to be outliers. The first and second coefficients split the points into three groups: a single isolated observation, a large group in the middle, and a small group at the extreme. The third coefficient, in contrast, shows a more uniform distribution of the points, and the remaining coefficients replicate these patterns.
Recall that earlier results identified a clear outlier in the temperature data set, which corresponds to the Navacerrada weather station. This is expected: Navacerrada is located in the mountains at an altitude of 1200 m and has a different climate from the rest of the stations, since it never reaches high temperatures during the year. We also saw a group of curves with very high temperatures all year round.
X <- t(smooth.data.temp$fd$coefs)
kable(X, "html") %>%
kable_styling()%>%
scroll_box(width = "100%", height = "400px")
| Station | const | sin1 | cos1 | sin2 | cos2 | sin3 | cos3 | sin4 | cos4 | sin5 | cos5 | sin6 | cos6 | sin7 | cos7 | sin8 | cos8 | sin9 | cos9 | sin10 | cos10 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| A CORUÑA1980-2009 | 282.9709 | -32.68118 | -49.24660 | 8.595391 | -2.3037452 | 0.3396126 | -1.2829456 | -1.5742234 | 0.2493893 | 1.6434651 | 1.4763927 | 0.5393564 | 0.0381619 | -1.0506533 | -0.4844298 | -2.4205276 | -1.0417548 | -0.6451546 | 0.9155857 | 0.8589808 | 0.1679510 |
| A CORUÑA/ALVEDRO1980-2009 | 263.8590 | -33.11906 | -58.09502 | 9.242831 | -1.8206969 | 0.1105411 | -1.3695267 | -1.2670059 | 0.7788158 | 1.8448574 | 1.6781440 | 0.7265518 | 0.0037133 | -0.7646147 | -0.4569045 | -2.0857086 | -1.2108385 | -0.4207844 | 1.2151445 | 0.6486495 | -0.0827797 |
| SANTIAGO DE COMPOSTELA/LABACOLLA1980-2009 | 247.3361 | -35.20375 | -66.97325 | 12.487436 | -2.1071050 | -0.4324928 | -1.7249941 | -2.1047757 | 0.7180541 | 1.4001343 | 1.6420726 | 0.4874442 | -0.0986293 | -1.0590626 | 0.0448093 | -3.0653107 | -2.4339193 | -1.2827239 | 1.8492773 | 0.3721759 | 0.2723384 |
| VITORIA/FORONDA1980-2009 | 223.1514 | -40.91930 | -88.24389 | 12.863391 | -3.0496391 | 0.3968126 | -2.2957084 | -0.6456414 | -0.0714047 | 1.4506751 | 4.3696332 | 1.2177155 | -0.3885111 | -0.4258508 | -1.1716469 | -2.3236964 | -1.0885904 | 0.1777065 | 0.6035841 | 0.7626774 | 0.3613234 |
| ALBACETE/LOS LLANOS1980-2009 | 272.4308 | -49.04829 | -122.06585 | 21.687750 | 1.7514035 | -3.0278331 | -5.4396664 | -1.4216691 | 0.2295502 | 2.2035689 | 2.5871388 | 0.6720156 | 0.9444479 | -2.1301204 | -0.6040999 | -2.5611835 | -1.0894697 | -1.4370356 | 0.8645658 | -0.7850672 | 0.5221286 |
| ALICANTE1980-2009 | 348.9276 | -46.12441 | -84.42117 | 13.146310 | -2.0036211 | -0.5498516 | -2.3798870 | -0.7306191 | 0.4476576 | 2.3610941 | 1.3504552 | -0.5422736 | 0.2841932 | -1.5218617 | -0.6753411 | -1.8548751 | 0.4255017 | 0.0958278 | 0.1555817 | -0.2064544 | 0.0330772 |
| ALICANTE/EL ALTET1980-2009 | 348.2722 | -48.21014 | -84.29643 | 13.751352 | -2.6027049 | -0.3845500 | -2.1633002 | -0.8071643 | 0.4757679 | 2.2674861 | 0.9610034 | -0.4892393 | 0.2757264 | -1.3864844 | -0.6480226 | -1.7050889 | 0.3320227 | 0.1429877 | 0.0305349 | -0.2380840 | -0.2644156 |
| ALMERÍA/AEROPUERTO1980-2009 | 364.0427 | -45.49737 | -81.18186 | 13.097272 | -1.6606374 | -1.9124517 | -1.5818725 | -0.9524567 | 0.2698357 | 1.9084019 | 0.9694160 | -0.5515258 | 0.4816866 | -1.6358793 | 0.2177920 | -1.6041096 | -1.1320598 | -0.6844276 | 0.5257817 | 0.4121669 | -0.3275583 |
| ASTURIAS/AVILÉS1980-2009 | 257.9511 | -36.61986 | -52.59049 | 10.053055 | -1.4910252 | 0.7775722 | -1.4548434 | -0.1917113 | 0.4238810 | 1.7423385 | 2.9861942 | 0.7380779 | 0.2393803 | -0.9755369 | -0.8929982 | -2.1487748 | -0.8426929 | -0.4347785 | 0.9571586 | 0.9192926 | 0.2922768 |
| OVIEDO1980-2009 | 253.0988 | -36.30085 | -63.25778 | 11.296854 | -2.5288215 | 1.0391899 | -1.9154719 | -1.2433204 | 0.4251423 | 1.3784173 | 3.0899380 | 1.4024935 | -0.2693780 | -1.1462597 | -1.0814633 | -2.9921729 | -1.7762972 | -0.9940671 | 1.2824264 | 1.3856617 | 0.9276492 |
| BADAJOZ/TALAVERA LA REAL1980-2009 | 325.6690 | -43.90004 | -107.56623 | 17.239594 | -3.1363412 | -1.3324416 | -4.8173140 | -2.8947426 | 1.5080127 | 1.6062245 | 1.9279478 | 0.6963884 | 0.4680668 | -1.4989788 | 0.3960832 | -2.8464535 | -1.5120814 | -1.3649869 | 1.9282333 | -0.4811796 | -0.1284497 |
| IBIZA/ES CODOLA1980-2009 | 349.6330 | -52.88551 | -81.24233 | 12.870383 | -0.2725182 | -0.7795829 | -1.4958276 | 0.1029300 | 0.5285447 | 1.9070040 | 1.5262670 | -0.4553968 | 0.6253648 | -1.1825752 | -0.6649610 | -1.5918915 | 0.3089940 | -0.3237522 | 0.1072903 | -0.0880666 | 0.3704459 |
| MENORCA/MAÓ1980-2009 | 329.5930 | -55.01061 | -83.53072 | 11.845380 | 1.6526111 | -0.4278321 | -1.9602342 | 1.4449647 | -0.0122252 | 1.6656067 | 2.6368260 | -0.3394369 | 0.1454563 | -0.8231486 | -1.2971414 | -1.4832036 | -0.2200009 | -0.3596399 | -0.3977496 | -0.3435046 | 0.1670750 |
| PALMA DE MALLORCA, CMT1980-2009 | 347.9986 | -52.48874 | -82.41763 | 11.686637 | 1.1924935 | -0.3306164 | -1.9132485 | 1.0174294 | 0.1624839 | 1.7962913 | 1.9893369 | -0.5644276 | 0.3845370 | -1.0642415 | -1.1707003 | -1.5357421 | -0.1235907 | -0.5714598 | -0.2809617 | 0.0967112 | 0.0432689 |
| PALMA DE MALLORCA/SON SAN JUAN1980-2009 | 316.0561 | -55.14758 | -92.35168 | 12.101807 | 1.7861056 | 0.4044443 | -1.8704588 | 0.8717016 | -0.1540066 | 1.6315420 | 2.2511071 | -0.4674628 | 0.3445206 | -1.4086754 | -1.2279082 | -1.7142561 | -0.0515519 | -0.2451104 | -0.3671536 | -0.3065392 | -0.1182372 |
| BARCELONA (FABRA)1980-2009 | 294.9279 | -45.88051 | -94.95851 | 15.940956 | -0.5372185 | -0.7630372 | -3.2552023 | 1.2568043 | -0.0257801 | 0.9368625 | 3.3617240 | 0.1382251 | -0.6695807 | -0.7173391 | -1.9011148 | -3.4975102 | -0.9612442 | -1.5381713 | 0.4299776 | -0.2946184 | 0.3104747 |
| BARCELONA/AEROPUERTO1980-2009 | 307.3417 | -49.56833 | -89.01720 | 15.080522 | -2.6521035 | -0.1169634 | -2.4904273 | 0.7402777 | -0.2268558 | 1.6431689 | 1.6703286 | -0.0245143 | -0.5773005 | -0.9612822 | -1.1289312 | -1.9108302 | -0.3977534 | -0.4139661 | -0.4036458 | -0.2172389 | 0.2615180 |
| BURGOS/VILLAFRÍA1980-2009 | 204.8741 | -44.33953 | -100.91199 | 18.118797 | -0.5421837 | -2.4626916 | -3.3940569 | -1.0513769 | 0.1672130 | 1.5749597 | 3.8215852 | 0.9319479 | -0.4262907 | -1.6347889 | -1.3308094 | -3.4901852 | -1.8133426 | -0.5281978 | 0.5933607 | 0.0988031 | 0.2863348 |
| JEREZ DE LA FRONTERA/AEROPUERTO1980-2009 | 345.8833 | -45.89199 | -90.20972 | 14.998034 | -4.7748153 | -1.8425975 | -4.3387355 | -1.8803044 | 1.0591958 | 0.6574014 | 2.2112618 | 0.3627006 | 0.6297509 | -2.6514532 | 1.0779834 | -2.1079890 | -1.3316217 | -1.2025921 | 1.4647778 | 0.0485062 | -0.3635926 |
| TARIFA1980-2009 | 328.4858 | -36.03003 | -50.95503 | 7.769991 | -1.6460191 | -0.2790668 | -1.6672347 | -1.0859271 | 1.3254287 | 1.1069472 | 1.8358887 | 0.4396295 | 0.0856426 | -0.8422001 | 0.3779137 | -1.9363247 | 0.0314235 | -0.9904346 | 0.6391040 | -0.1659297 | -0.0393225 |
| SANTANDER/PARAYAS1980-2009 | 276.5759 | -38.35648 | -61.40470 | 8.661259 | -1.5347979 | 0.2852200 | -0.8163365 | -0.2538569 | 0.3556592 | 1.3477046 | 3.4739021 | 0.4092001 | -0.0823766 | -1.1024149 | -1.1198325 | -1.7666019 | -0.8382911 | 0.5746357 | 0.6646627 | 1.4098108 | 0.1308358 |
| CASTELLÓN1980-2009 | 334.3945 | -46.59058 | -89.94861 | 14.389232 | -2.0198918 | -0.6908869 | -2.5465360 | -0.0009230 | 0.4364707 | 2.0256772 | 1.4602464 | -0.4024054 | -0.4563241 | -1.4024511 | -1.2523018 | -1.8381999 | -0.1791288 | -0.5468850 | -0.1464410 | -0.5700115 | 0.2029905 |
| CIUDAD REAL1980-2009 | 297.4614 | -47.08403 | -126.97112 | 23.258096 | 1.1308664 | -3.4101622 | -6.2466474 | -2.4614911 | 0.6327544 | 1.8807467 | 2.3996822 | 0.4286394 | 1.1332721 | -2.0817999 | -0.6371823 | -3.2122212 | -1.3320213 | -1.7960128 | 1.3309969 | -1.0943402 | 0.2338777 |
| CÓRDOBA/AEROPUERTO1980-2009 | 348.1307 | -47.09854 | -114.29918 | 19.808521 | -3.1437448 | -2.2266194 | -4.7054566 | -2.1102150 | 0.0404472 | 1.1510661 | 3.6754522 | 0.5496891 | 0.6433824 | -2.4708421 | -0.0341190 | -3.3521371 | -0.9744931 | -1.7048407 | 1.7953006 | -0.1232181 | 0.1195404 |
| CUENCA1980-2009 | 250.2261 | -48.38848 | -116.17132 | 22.395020 | 4.1162231 | -3.4276089 | -5.0561744 | -1.5687520 | 0.1425971 | 2.6934008 | 2.5664007 | 0.8949024 | 1.1225062 | -1.9034042 | -1.1435199 | -3.0557302 | -1.8869315 | -1.6913611 | 1.1771124 | -0.7559855 | 0.4278213 |
| GIRONA/COSTA BRAVA1980-2009 | 280.5285 | -47.73545 | -100.98006 | 15.145432 | -1.0350566 | -0.2946875 | -4.1352209 | 1.5619169 | -0.0429222 | 1.4098085 | 3.3840749 | 0.8845244 | -0.6812364 | -0.9076658 | -1.6855095 | -2.1167485 | -0.9810921 | -0.8308340 | -0.1788391 | -0.6319935 | -0.1659579 |
| GRANADA/AEROPUERTO1980-2009 | 294.5806 | -44.71304 | -115.02854 | 18.887283 | -0.4321583 | -1.7589049 | -6.3216661 | -2.3989474 | 0.0816340 | 1.7250138 | 1.8953361 | -0.1413848 | 1.0638404 | -2.7688904 | 0.2846322 | -2.8160631 | -1.4278381 | -1.5122466 | 1.2906884 | -0.3173315 | 0.2950064 |
| GRANADA/BASE AÉREA1980-2009 | 298.8564 | -48.92945 | -115.85460 | 21.232115 | 1.2814281 | -2.8848752 | -6.2691122 | -2.2091075 | 0.6043530 | 1.9387219 | 2.3176617 | 0.2547009 | 0.8121474 | -2.5752535 | -0.0941849 | -2.5934791 | -1.3260102 | -1.4776899 | 1.5386042 | -0.3607214 | 0.5397760 |
| MOLINA DE ARAGÓN1980-2009 | 201.9101 | -43.11809 | -110.23869 | 18.820141 | 2.3843226 | -2.6344375 | -4.0827932 | -0.7453938 | -0.5387598 | 2.5542857 | 3.1290413 | 0.7710117 | 0.3554959 | -1.8919601 | -1.2860722 | -2.4674887 | -1.2415843 | -0.5687444 | 1.0317828 | -0.7811976 | 1.0643516 |
| SAN SEBASTIÁN,IGUELDO1980-2009 | 258.4528 | -40.04514 | -64.00734 | 8.886530 | -3.2411814 | 2.4943822 | -1.5070702 | 0.2150553 | -0.3813089 | 0.8554214 | 4.2780129 | 0.9151852 | -0.5050500 | -1.4522428 | -1.4493628 | -2.9080020 | -1.5373476 | -0.1597930 | 0.9264781 | 1.8186240 | 0.8549084 |
| SAN SEBASTIÁN/FUENTERRABIA1980-2009 | 283.2612 | -38.84987 | -75.88753 | 9.554445 | -3.8835579 | 1.2298224 | -1.2360762 | -0.0635833 | 0.0294616 | 0.8328380 | 4.1287316 | 0.5849700 | -0.8887101 | -1.2149563 | -1.7516396 | -2.0917871 | -1.0349698 | 0.1676777 | 0.7716215 | 1.4975332 | 0.8966897 |
| HUESCA/PIRINEOS1980-2009 | 268.7367 | -45.66780 | -117.18895 | 19.676455 | -2.6727733 | -1.4150859 | -6.9174359 | 0.0994020 | -0.9718866 | 2.2358890 | 2.5789887 | 0.7385824 | -1.3107901 | -0.5283942 | -2.2373120 | -3.2616257 | -2.1057345 | -0.7876771 | 0.8682700 | 0.0807724 | 0.3533677 |
| LOGROÑO/AGONCILLO1980-2009 | 266.2311 | -42.32996 | -105.87359 | 16.972611 | -1.8839624 | -1.0887304 | -3.8308906 | -1.3816318 | -0.1940449 | 1.5475247 | 3.0913959 | 0.8704754 | -0.9385136 | -1.0964059 | -1.7050126 | -3.0957005 | -1.1645640 | -0.5689882 | 0.8225123 | 0.3100136 | 0.2437765 |
| FUERTEVENTURA/AEROPUERTO1980-2009 | 401.1117 | -32.67589 | -33.83380 | 3.513117 | -5.3925490 | -1.2706739 | -2.3674244 | -0.8858529 | 0.7149449 | 0.1246139 | 0.9134413 | 0.2046021 | -0.2211611 | -0.8517626 | 1.0277058 | -0.8987987 | -0.2447961 | -0.5698163 | 0.7282367 | -0.4137462 | -0.2043981 |
| LANZAROTE/AEROPUERTO1980-2009 | 402.5901 | -33.81782 | -37.49656 | 5.514036 | -7.1326198 | -1.6008734 | -1.9979149 | -1.1345652 | -0.1794541 | 0.7445969 | 1.3945002 | 0.3083375 | 0.1361207 | -0.9727994 | 0.6242561 | -0.7162369 | -0.5110697 | -0.7016410 | 0.9940900 | -0.7159961 | 0.4440462 |
| LAS PALMAS DE GRAN CANARIA/GANDO1980-2009 | 402.6180 | -33.19917 | -30.14324 | 3.830208 | -5.5223056 | -1.2753214 | -2.7433501 | -0.9324105 | 0.2560589 | 0.4978318 | 0.8592252 | 0.1447162 | -0.1050152 | -0.7928075 | 0.2521311 | -0.6944383 | -0.3413203 | -0.5375237 | 0.6286039 | -0.6541205 | -0.1143705 |
| LEÓN/VIRGEN DEL CAMINO1980-2009 | 212.5049 | -42.38743 | -101.39484 | 17.638649 | -1.0482601 | -2.1090527 | -5.0687490 | -2.5782870 | -0.0488386 | 1.5318945 | 2.1379832 | 0.5593061 | -0.1429423 | -1.7413622 | -0.7127233 | -4.0906030 | -2.2507616 | -1.5855628 | 1.3683732 | -0.2637101 | 0.5546630 |
| PONFERRADA1980-2009 | 248.7203 | -36.31874 | -104.98149 | 17.460965 | -3.7291022 | -1.6864323 | -5.8951025 | -1.8680505 | -0.0948651 | 2.0793128 | 2.0149421 | 0.7502435 | 0.0441252 | -1.0423404 | -0.3164346 | -2.8028768 | -2.5108506 | -0.8982679 | 1.3137203 | 0.6688332 | 0.2645290 |
| COLMENAR VIEJO/FAMET1980-2009 | 255.0058 | -47.49807 | -116.12857 | 24.557230 | 4.4338149 | -4.4832603 | -5.0962643 | -2.3723961 | -1.0975753 | 1.4884972 | 3.8041818 | 1.6663396 | 0.8774147 | -1.6319455 | -2.0914784 | -4.7200766 | -1.3765944 | -1.8746606 | 1.7202522 | -0.1408081 | 0.7817887 |
| MADRID,RETIRO1980-2009 | 287.6036 | -44.64229 | -119.05447 | 23.540129 | 1.3089435 | -3.8783261 | -5.3568416 | -2.2664051 | 0.7439392 | 2.4593373 | 1.7317097 | 1.3610133 | 0.4821807 | -1.4826086 | -1.1527550 | -3.5493721 | -1.9941530 | -1.7481407 | 1.5189293 | -0.7039646 | 0.7214449 |
| MADRID/BARAJAS1980-2009 | 276.8936 | -48.00527 | -121.37220 | 22.464441 | 1.3054254 | -3.2081313 | -5.7552329 | -1.7372602 | 0.7555585 | 2.4421005 | 1.8810752 | 0.7756651 | 0.6030327 | -1.5406764 | -0.8850883 | -2.9132109 | -1.5899130 | -1.3250289 | 1.3415555 | -1.0477705 | 0.4103310 |
| MADRID/CUATRO VIENTOS1980-2009 | 285.4471 | -47.50713 | -120.49242 | 23.437987 | 1.8754081 | -3.2764320 | -5.7137470 | -2.4349004 | 0.7236317 | 2.5085194 | 1.8308773 | 1.0859571 | 0.5598366 | -1.4800548 | -1.0452474 | -3.4611308 | -1.8570788 | -1.7755851 | 1.5164082 | -0.7352007 | 0.7029367 |
| MADRID/GETAFE1980-2009 | 286.9563 | -47.67304 | -123.57865 | 23.249717 | 1.5916781 | -3.1514644 | -5.9607843 | -2.1287357 | 0.7288148 | 2.5489124 | 2.0452804 | 0.8590877 | 0.6246180 | -1.6289714 | -0.9686252 | -3.1566203 | -1.8552382 | -1.7662536 | 1.5633323 | -0.9455556 | 0.4046981 |
| MADRID/TORREJÓN1980-2009 | 279.6705 | -47.23282 | -120.94190 | 22.348516 | 0.9826405 | -3.0944031 | -5.5358665 | -1.8744878 | 0.5579286 | 2.4514463 | 2.0093644 | 0.7511068 | 0.6954879 | -1.5956445 | -1.0429720 | -3.0840252 | -1.7496061 | -1.4508520 | 1.2619539 | -0.9534738 | 0.3865823 |
| NAVACERRADA,PUERTO1980-2009 | 132.9853 | -53.95161 | -103.43337 | 24.846670 | 9.8394762 | -4.1483201 | -3.5953767 | -2.6076611 | 0.1126252 | 2.9552357 | 3.0976508 | 1.6879127 | 1.3688310 | -2.6085883 | -0.5916216 | -4.3649179 | -2.5106630 | -2.3333989 | 2.5040951 | -0.0789479 | 1.5360515 |
| MÁLAGA/AEROPUERTO1980-2009 | 352.9565 | -43.17149 | -81.41849 | 13.443651 | -0.3629347 | -1.0272449 | -1.9956965 | -2.1763667 | 0.1135395 | 1.4315177 | 1.2329542 | -0.1292561 | 0.9159449 | -1.5539209 | 0.3564688 | -1.6141302 | 0.6210521 | -0.1232930 | 0.0700585 | -0.4975258 | 0.1967981 |
| MELILLA1980-2009 | 361.8327 | -45.48178 | -70.02966 | 12.269730 | -1.2793021 | -2.3814288 | -1.4780620 | -0.3452641 | -0.0403101 | 0.8848586 | 1.2189006 | 0.0679312 | 0.4937988 | -1.5785974 | 0.5185419 | -1.9420450 | 0.3698374 | -0.6121313 | 0.3321095 | -0.1174500 | -0.1068946 |
| MURCIA/ALCANTARILLA1980-2009 | 348.1318 | -47.24291 | -102.37648 | 17.140922 | -3.1023404 | -1.0130222 | -3.6107930 | -0.9406348 | -0.2219212 | 1.8824235 | 1.9824157 | -0.6117331 | 0.1676451 | -1.8746688 | -0.4012148 | -2.0985941 | 0.5685491 | -0.5621050 | 0.2747132 | -0.6830094 | 0.0358463 |
| MURCIA/SAN JAVIER1980-2009 | 336.7576 | -48.75607 | -84.73836 | 13.289239 | -3.7843897 | -0.2563130 | -2.3370099 | -1.2238191 | -0.0280220 | 1.8899509 | 1.1609044 | -0.6907776 | 0.4419751 | -1.7003267 | -0.2102765 | -1.5519291 | 0.3330464 | -0.0693439 | 0.0535695 | -0.4910863 | -0.3938778 |
| PAMPLONA/NOAIN1980-2009 | 245.9506 | -44.14326 | -99.79554 | 14.922033 | -2.9652105 | -0.6071312 | -3.3918470 | -0.7394620 | -0.6699973 | 1.1928567 | 4.1079005 | 0.6532825 | -1.0312738 | -0.9425560 | -1.6833711 | -2.9570379 | -1.5844522 | -0.5675188 | 0.3509109 | 0.2347016 | 0.0942964 |
| OURENSE1980-2009 | 284.6056 | -37.37696 | -90.47892 | 16.313192 | -3.2171014 | -0.8098233 | -3.8798653 | -1.3711647 | 1.0120323 | 2.0172391 | 2.0607675 | 0.6299683 | -0.1781738 | -0.4975259 | -0.4662668 | -2.5538253 | -1.7586210 | -0.5248777 | 1.6813259 | 0.3354333 | -0.0378846 |
| VIGO/PEINADOR1980-2009 | 267.3699 | -33.35902 | -66.58544 | 11.618272 | -3.7760041 | -0.0044914 | -3.4029775 | -1.7234516 | 1.0439281 | 1.7360345 | 1.1136600 | 0.2927954 | -0.2258620 | -0.8458975 | 0.1523512 | -3.1837677 | -2.1428817 | -1.3116561 | 1.9786866 | 0.0909782 | -0.0353470 |
| SALAMANCA,OBS.1980-2009 | 239.9384 | -43.62864 | -109.02429 | 20.467697 | 0.9700250 | -1.6658304 | -5.4229523 | -3.1802306 | 0.4028463 | 2.2430545 | 3.9857006 | -0.0223607 | 0.4229357 | -1.1401277 | -0.5602922 | -3.1932187 | -1.5769664 | -0.4173911 | 2.1850277 | -0.3554443 | 0.0010631 |
| SALAMANCA/MATACAN1980-2009 | 232.1253 | -42.38085 | -108.52682 | 17.752170 | 0.7158430 | -1.2632802 | -5.1703737 | -2.2443725 | 0.4645872 | 1.6127496 | 2.8392797 | 0.1498864 | 0.8162850 | -1.7053014 | -0.0714528 | -3.1609733 | -1.8141266 | -0.7739819 | 1.2712686 | -0.1845732 | 0.4766317 |
| HIERRO/AEROPUERTO1980-2009 | 401.4540 | -31.50532 | -19.51186 | 1.364761 | -5.1092195 | 0.5707162 | -0.9238634 | -1.2070518 | 1.0441429 | 0.3471551 | 0.5442416 | -0.4520099 | -0.1076944 | -0.5222332 | 0.0505967 | -0.9016524 | 0.3890734 | 0.2350044 | 0.5086675 | -0.2568543 | 0.4373297 |
| IZAÑA1980-2009 | 194.8986 | -45.32542 | -80.51824 | 18.132518 | 7.5603657 | -9.4243865 | -3.1108006 | 0.0202531 | -0.3380994 | 0.8909050 | 1.6408798 | 0.6905826 | 0.6277933 | -3.4583441 | 0.3041682 | 1.5096234 | 0.0193015 | -0.2177697 | 1.4408512 | -0.3023127 | -1.3766390 |
| LA PALMA/AEROPUERTO1980-2009 | 395.9528 | -33.05775 | -25.82311 | 3.359385 | -4.2017227 | -0.1209668 | -1.4844171 | -1.0075417 | 0.7558737 | 0.0619565 | 0.6734901 | -0.0087788 | 0.0773405 | -0.5424580 | 0.1733546 | -0.5282945 | -0.1728016 | -0.0319106 | 0.2810443 | -0.4529555 | -0.0309601 |
| STA.CRUZ DE TENERIFE1980-2009 | 410.5936 | -35.03477 | -35.54473 | 5.528362 | -3.0264724 | -1.7301168 | -1.8770781 | -0.2486101 | 0.1557218 | -0.1560639 | 1.2278767 | 0.7030744 | -0.5396445 | -0.7717236 | 0.5773617 | -0.3069350 | -0.7653267 | -0.4704271 | 0.8281445 | -0.4490524 | 0.2880104 |
| TENERIFE/LOS RODEOS1980-2009 | 320.3172 | -37.92652 | -37.48231 | 8.032729 | -5.9210921 | -1.8568158 | -2.4430415 | -1.2774351 | -0.6499157 | 0.8983609 | 2.4142353 | 0.6537382 | 0.0011200 | -1.3043057 | 0.6746014 | -0.6023816 | -1.5752180 | -0.2617020 | 1.4093498 | -1.5468789 | 0.2892457 |
| TENERIFE/SUR1980-2009 | 407.7493 | -33.33636 | -26.89772 | 5.073226 | -4.8726773 | -2.0199029 | -2.3753123 | -0.9378790 | -0.0308208 | 0.3977674 | 1.4343800 | 0.3965625 | 0.2538654 | -0.6777184 | 0.1689851 | -0.1254949 | -1.0148377 | -0.5673287 | 0.8874468 | -0.8355502 | 0.5637646 |
| MORÓN DE LA FRONTERA1980-2009 | 341.7373 | -49.77432 | -100.12918 | 18.098574 | -2.9694603 | -2.6355718 | -4.1333821 | -1.6436727 | 0.7984065 | 1.0256182 | 2.4433413 | 0.2112953 | 0.4341821 | -2.6612901 | 0.8630047 | -2.9671738 | -1.7611915 | -1.4645421 | 1.5010054 | -0.1305004 | -0.4540712 |
| SEVILLA/SAN PABLO1980-2009 | 366.3539 | -45.01475 | -104.70683 | 16.543993 | -3.9700683 | -2.1214666 | -5.3529308 | -2.1857566 | 1.1281833 | 1.1993482 | 2.3792655 | 0.9189294 | 0.3137983 | -2.2411730 | 1.0351938 | -2.6754720 | -1.5406434 | -1.3476787 | 1.5131091 | 0.0250691 | -0.0524160 |
| SORIA1980-2009 | 211.6560 | -45.66816 | -106.05302 | 20.066786 | 2.1957522 | -3.1580646 | -4.3112252 | -0.9876015 | -0.0433532 | 2.0774066 | 3.2896320 | 1.1834980 | -0.1546677 | -1.7970440 | -0.9269142 | -3.4316607 | -1.4723920 | -1.3810208 | 1.3057324 | -0.0986145 | 0.6186318 |
| REUS/AEROPUERTO1980-2009 | 307.8827 | -47.31925 | -93.50775 | 15.929967 | -1.5565675 | -0.6988968 | -2.9272994 | 0.6648090 | -0.0450142 | 2.0924088 | 2.0612062 | -0.0563098 | -0.6263165 | -1.1663010 | -1.5024391 | -2.0762358 | -0.6393159 | -0.3355936 | -0.4905200 | -0.3151482 | 0.2944451 |
| TORTOSA1980-2009 | 339.0842 | -44.67341 | -100.16069 | 16.744688 | -2.7613014 | -1.0756524 | -3.7857433 | -0.2198231 | -0.1787674 | 2.1854855 | 1.7230283 | 0.0283966 | -0.6621191 | -0.7434074 | -1.7371805 | -2.0255477 | -0.6575000 | -0.8451663 | 0.0528380 | -0.3573695 | 0.4458831 |
| VALENCIA1980-2009 | 349.6455 | -45.28048 | -84.15514 | 14.813648 | -2.2299117 | -0.6799715 | -2.5222424 | -0.3123141 | 0.5471122 | 2.2366860 | 1.6338033 | -0.2572361 | -0.1879941 | -1.5459020 | -1.1934859 | -1.8570736 | 0.2298603 | -0.4381806 | 0.1604740 | -0.4707109 | 0.2765789 |
| VALENCIA/AEROPUERTO1980-2009 | 336.9614 | -46.01320 | -92.46138 | 14.994866 | -2.9074100 | -0.3369762 | -2.8966836 | -0.1758707 | 0.2045736 | 2.3313945 | 1.7014032 | -0.1156503 | -0.1666646 | -1.7630489 | -1.1667770 | -1.8411042 | 0.2526861 | -0.5395653 | 0.0993123 | -0.5766047 | 0.3109028 |
| VALLADOLID1980-2009 | 243.0724 | -43.84458 | -111.93685 | 20.366113 | -1.4592595 | -2.2057913 | -5.1824314 | -1.8842919 | 0.3282828 | 1.7452060 | 2.6552905 | 0.6665892 | 0.3054141 | -1.4054358 | -0.9741368 | -3.3722031 | -1.9322844 | -0.4924167 | 1.1518315 | -0.0938691 | 0.4262927 |
| VALLADOLID (VILLANUBLA)1980-2009 | 219.1508 | -45.25789 | -105.75358 | 20.318160 | -1.8254685 | -1.7038280 | -5.2232189 | -1.9052711 | 0.0160195 | 1.7155395 | 2.6391103 | 0.7172604 | 0.1356750 | -1.7489646 | -0.5003928 | -4.0889310 | -1.9760445 | -0.9757825 | 1.0978410 | -0.0884432 | 0.5295816 |
| BILBAO/AEROPUERTO1980-2009 | 280.4376 | -39.45003 | -68.59287 | 9.443115 | -3.2612505 | 1.9862155 | -1.8553551 | 0.0308771 | -0.4097025 | 1.0206858 | 3.9902742 | 0.9984448 | -0.5388331 | -1.2305048 | -0.9968320 | -2.7502039 | -1.2565543 | -0.1349834 | 1.0585559 | 1.3489162 | 0.7036800 |
| ZAMORA1980-2009 | 251.2256 | -42.83454 | -111.23115 | 19.089749 | -1.5321409 | -2.1082886 | -5.5185444 | -2.2139105 | 0.3562643 | 1.3289032 | 2.5426947 | 0.5035071 | 0.4406667 | -1.3132291 | -0.5720413 | -3.3466812 | -2.0454288 | -0.8341519 | 1.0138154 | 0.0672543 | 0.0672003 |
| DAROCA1980-2009 | 249.0036 | -46.82672 | -114.47812 | 19.377694 | 1.0935429 | -2.2789533 | -4.5129364 | -0.6450739 | -0.5617587 | 1.7883376 | 4.0246911 | 0.7098044 | 0.0335291 | -1.7050263 | -1.3330633 | -2.8229025 | -1.2673760 | -0.6583877 | 0.9172278 | -0.1699197 | 0.6241470 |
| ZARAGOZA (AEROPUERTO)1980-2009 | 296.0386 | -42.98689 | -118.35949 | 17.489648 | -2.6339248 | -0.7443013 | -5.0297353 | -0.9083070 | -0.4943320 | 1.7822145 | 2.7465215 | 0.7431359 | -0.9005880 | -0.6851087 | -1.9218739 | -2.5422655 | -0.9392230 | -0.4225503 | 0.6848014 | -0.1432006 | 0.2440261 |
pairs(X,pch=19,col="chartreuse3",main="Coefficients of the basis expansions")
K-means assigns each observation to the cluster whose mean (centroid) is nearest. To identify the optimal number of groups we use the average silhouette width, which is maximized at the best \(k\). In this case, \(k=5\) appears to be a useful partition point. Consequently, we try \(G=5\) with a maximum of 1000 iterations and 100 initial solutions.
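The selection of the number of groups by average silhouette can be sketched on simulated data. The example below is an illustration only: the three group centers and the noise level are assumed values, and the loop is a hand-written stand-in for the kind of summary that `fviz_nbclust(..., method="silhouette")` plots, not a reproduction of its internals.

```r
library(cluster)

set.seed(1)
# Three well-separated toy groups in the plane (assumed centers and spread).
centers <- rbind(c(0, 0), c(5, 5), c(10, 0))
toy <- do.call(rbind, lapply(1:3, function(g) {
  cbind(rnorm(20, centers[g, 1], 0.3), rnorm(20, centers[g, 2], 0.3))
}))

# Average silhouette width of the k-means solution for each candidate k.
k_grid <- 2:6
avg_sil <- sapply(k_grid, function(k) {
  km <- kmeans(toy, centers = k, iter.max = 1000, nstart = 100)
  mean(silhouette(km$cluster, dist(toy))[, "sil_width"])
})
names(avg_sil) <- k_grid

# The candidate with the largest average silhouette is retained.
best_k <- k_grid[which.max(avg_sil)]
print(round(avg_sil, 3))
print(best_k)
```

Using many random starts (`nstart`) matters because k-means only converges to a local optimum; the solution with the smallest within-cluster sum of squares among the starts is the one that is kept.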
library("factoextra")
fviz_nbclust(X,kmeans,method="silhouette",k.max=10)
kmeans.X <- kmeans(X,centers=5,iter.max=1000, nstart=100)
The following plots present the best of the 100 final solutions. In the pairs plot, groups of observations with the same color are clearly located at the extremes, and only a few coefficients show mixed data points, which reflects a clear grouping trend. Moreover, we can confirm that there is a group of observations whose behavior differs from the rest. These are colored in purple and, based on previous assignments, we know that these curves correspond to zones where the temperatures are higher all year round.
In the second plot, the clustering is clearly visible. The curves are divided in an expected pattern: the purple curves are those whose average daily temperature remains nearly constant throughout the year, whereas the blue curves correspond to zones with greater temperature variation across the seasons that do not reach temperatures as high as the orange or green ones. The pink group contains curves with a more stable trend than the blue group, with smoother variations across the seasons, but lower temperatures than the purple ones.
Furthermore, the third plot presents the silhouette widths: the average is 0.55, and the largest cluster average corresponds to the fifth cluster, which contains the purple curves with extreme behavior.
colors.kmeans.X <- c("green","orange","pink","blue","purple")[kmeans.X$cluster]
pairs(X,pch=19,col=colors.kmeans.X,main="kmeans solution with the coefficients of the basis expansions")
plot(smooth.data.temp,lty=1,lwd=2,col=colors.kmeans.X,main="kmeans solution with the coefficients of the basis expansions",xlab="Day",ylab="Temperature")
## [1] "done"
library("cluster")
sil.kmeans.X <- silhouette(kmeans.X$cluster,dist(X,"euclidean"))
plot(sil.kmeans.X,col=c("green","orange","pink","blue","purple"))
The following map shows the final clustering obtained with this method.
# MAPS basic
coordinates <- matrix(c(aemet$df$longitude,aemet$df$latitude), nrow=73, ncol=2)
rownames(coordinates) <- aemet$df$name
colnames(coordinates) <- c("Longitude", "Latitude")
# Leaflet marker icons colored by cluster
icons <- awesomeIcons(
icon = 'ios-close',
iconColor = 'black',
library = 'ion',
markerColor = colors.kmeans.X
)
map <- leaflet() %>%
addTiles() %>%
addAwesomeMarkers(lng = coordinates[,1],
lat = coordinates[,2],
popup = aemet$df$name,
icon=icons)
map
The results show that this partition is sensible: it divides the stations into zones according to their geographical location. This is understandable, since temperature is strongly related to the zone and is affected by latitude, terrain, and altitude, as well as by nearby bodies of water and their currents. The stations in the north form the pink group and, as noted, have lower and more stable temperatures; the green ones, in the center, have higher temperatures and larger changes over the seasons; the blue ones, in the central mountain area, have the lowest temperatures and also change throughout the year; and the orange ones, near the coast, have higher temperatures. Lastly, only two stations on the Canary Islands are assigned to the blue and orange groups: Izaña and Tenerife/Los Rodeos. Unlike the other stations on the islands, they are located inland and not so close to the sea, so it is understandable that their curves behave more like those of the other groups than like the purple group, which contains locations with more extreme temperatures.
PAM, which stands for “partitioning around medoids”, is a clustering algorithm similar to k-means: it attempts to minimize the distance between each point and the point designated as the center of its cluster. Unlike k-means, these centers, or medoids, are actual observations, so the method can be used with arbitrary distances. As before, we choose \(k\) with the average silhouette. The plot suggests the presence of 5 clusters, so we try \(G=5\) using the Manhattan metric to compute the distances.
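The two properties mentioned above, medoids being observed points and PAM working from any dissimilarity, can be illustrated with a small sketch. The simulated data `toy` (two groups with assumed centers and spread) are hypothetical and unrelated to the temperature coefficients.

```r
library(cluster)

set.seed(2)
# Two toy groups in the plane (assumed centers 0 and 6, standard deviation 0.5).
toy <- rbind(matrix(rnorm(30, 0, 0.5), ncol = 2),
             matrix(rnorm(30, 6, 0.5), ncol = 2))

# PAM on the raw data with the Manhattan metric.
pam_fit <- pam(toy, k = 2, metric = "manhattan", stand = FALSE)

# The medoids are rows of the data: id.med holds their row indices.
medoid_rows <- toy[pam_fit$id.med, , drop = FALSE]
print(pam_fit$id.med)
print(medoid_rows)

# PAM also accepts a precomputed dissimilarity object, so any distance can be used.
pam_diss <- pam(dist(toy, method = "manhattan"), k = 2)
table(raw = pam_fit$clustering, diss = pam_diss$clustering)
```

Because the centers are observations rather than averages, PAM tends to be less affected by outlying points than k-means, at a higher computational cost.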
fviz_nbclust(X,cluster::pam,method="silhouette",k.max=10)
pam.X <- pam(X,k=5,metric="manhattan",stand=FALSE)
Next, we present a pairs plot of the clustered basis expansion coefficients, a plot of the smoothed average daily temperature curves of the Spanish stations, and the resulting silhouette plot.
Compared with the k-means results, the average silhouette is smaller, which indicates that the first method gave a better clustering. There are also clear similarities: after matching the colors to the groups of the previous solution, we obtain almost the same division of the curves, which maintains the geographical grouping by zone.
colors.pam.X <- c("pink","blue","green","orange","purple")[pam.X$cluster]
pairs(X,pch=19,col=colors.pam.X,
main="PAM solution with the coefficients of the basis expansions")
plot(smooth.data.temp,
lty=1,lwd=2,
col=colors.pam.X,main="PAM solution with the coefficients of the basis expansions",xlab="Day",ylab="Temperature")
## [1] "done"
sil.pam.X <- silhouette(pam.X$clustering,dist(X,"manhattan"))
plot(sil.pam.X,col=c("pink","blue","green","orange","purple"))
On the map of this clustering, the pink and orange groups are mixed: Tarifa is no longer assigned to the west coast group, and Tenerife/Los Rodeos is also part of the pink group. Nevertheless, the partition is a good one: PAM uses the Manhattan metric to compute the dissimilarities between each object and its closest medoid, which differs from the previous method, so it is a good sign that its solution resembles the previous one.
# Leaflet marker icons colored by cluster
icons <- awesomeIcons(
icon = 'ios-close',
iconColor = 'black',
library = 'ion',
markerColor = colors.pam.X
)
map <- leaflet() %>%
addTiles() %>%
addAwesomeMarkers(lng = coordinates[,1],
lat = coordinates[,2],
popup = aemet$df$name,
icon=icons)
map
The third method is agglomerative hierarchical clustering, which starts by treating each object as a single cluster and then merges clusters successively until all of them have been merged into one big cluster containing all objects. Here we use average linkage: at each step, the two clusters with the smallest average distance between their members are merged.
First, we calculate the distances between the coefficients with the Manhattan metric, which is less sensitive to potential outliers, and present a dendrogram. It suggests the presence of two big clusters, where the second one has two branches: a small group of two observations and another containing two major groups. The optimal partition here differs from that of the other methods, but recall that for both of them \(k=2\) was the second-best option. We try \(G=2\) to see whether this main division gives a coherent clustering of the data.
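The average linkage rule can be verified on a small example. In the sketch below, the six points in `toy` are hypothetical values chosen to form two obvious groups; the check confirms that the height of the last merge equals the mean of all Manhattan distances between members of the two groups.

```r
# Six toy points forming two obvious groups (assumed values).
toy <- rbind(c(0, 0), c(0, 1), c(1, 0),
             c(10, 10), c(10, 11), c(11, 10))

# Manhattan distances and average linkage, as in the analysis of the coefficients.
d  <- dist(toy, method = "manhattan")
hc <- hclust(d, method = "average")

# Cut the tree into two groups.
groups <- cutree(hc, k = 2)
print(groups)

# Height of the final merge versus the mean cross-group distance.
cross_mean <- mean(as.matrix(d)[groups == 1, groups == 2])
print(c(final_height = max(hc$height), cross_mean = cross_mean))
```

Cutting the dendrogram with `cutree` at a given number of groups is equivalent to drawing a horizontal line across the tree below the merges that would reduce the number of clusters further.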
dist.X <- daisy(X,metric="manhattan",stand=FALSE)
average.X <- hclust(dist.X,method="average")
average.X.plot <- as.dendrogram(average.X)
par(cex=0.3, mar=c(5, 8, 3, 0))
plot(average.X.plot, xlab="", ylab="", main="", sub="", axes=FALSE)
rect.hclust(average.X,k=2,border="green")
par(cex=1)
title(main="Average linkage")
axis(2)
As the results show, this partition separates the curves into one group formed by the curves with extremely high temperatures and another with the rest of the curves. The average silhouette width is 0.5, which is lower than the one given by the k-means clustering; consequently, k-means remains the best solution obtained so far.
colors.average.X <- c("green","orange")[cutree(average.X,2)]
pairs(X,pch=19,col=colors.average.X,main="Hierarchical clustering solution with the coefficients of the basis expansions")
plot(smooth.data.temp,lty=1,lwd=2,col=colors.average.X,main="Hierarchical clustering solution with the coefficients of the basis expansions",xlab="Day",ylab="Temperature")
## [1] "done"
sil.average.X <- silhouette(cutree(average.X,2),dist(X,"euclidean"))
plot(sil.average.X,col=c("green","orange"))
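The average silhouette width can also be read directly from the silhouette object rather than from the plot. The sketch below uses simulated toy data (two well-separated groups, not the basis coefficients) and assumes the `cluster` package, which provides `silhouette`. Note that the widths depend on the dissimilarity passed to `silhouette`, so it is usually sensible to use the same metric whenever values are compared across methods.

```r
library(cluster)  # provides silhouette()

set.seed(1)
# Toy data: two well-separated groups of 10 points each (hypothetical)
toy <- rbind(matrix(rnorm(20, mean = 0, sd = 0.3), ncol = 2),
             matrix(rnorm(20, mean = 5, sd = 0.3), ncol = 2))

# Cluster with average linkage on Manhattan distances, then cut into 2 groups
d.toy <- dist(toy, method = "manhattan")
labels.toy <- cutree(hclust(d.toy, method = "average"), k = 2)

# Silhouette computed with the same dissimilarity used for the clustering
sil.toy <- silhouette(labels.toy, d.toy)
avg.width <- mean(sil.toy[, "sil_width"])      # overall average silhouette width
n.negative <- sum(sil.toy[, "sil_width"] < 0)  # observations with negative width
print(c(avg.width = avg.width, n.negative = n.negative))
```

Applied to the analysis, the same two summaries could be extracted from `sil.pam.X` or `sil.average.X`.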
If we locate the clustered curves on the map, we can see that the orange group contains the same 7 curves that were purple in the last two methods, which correspond to records from the stations in the Canary Islands that are close to the sea. The second group is formed by the rest of the curves in Spain together with the two stations Izana and Tenerife/Los Rodeos, which are not located as close to the sea as the others. This is a simple but effective partition that gives a general idea of the behavior of the stations, separating out the ones with extreme temperatures throughout the year.
# Leaflet map: one marker per station, colored by hierarchical cluster
icons <- awesomeIcons(
icon = 'ios-close',
iconColor = 'black',
library = 'ion',
markerColor = colors.average.X
)
map <- leaflet() %>%
addTiles() %>%
addAwesomeMarkers(lng = coordinates[,1],
lat = coordinates[,2],
popup = aemet$df$name,
icon=icons)
map
Next, we use the model-based clustering method, Mclust, which models the data as a Gaussian finite mixture with different covariance structures and different numbers of mixture components; that is, it assumes that the data come from a distribution that is a mixture of two or more clusters.
We compute the BIC for all available models with at most 10 components. As the output shows, the optimal number of components is \(G=7\), which attains the highest value of the BIC as reported by mclust (the -BIC in the usual sign convention, so larger is better). Mclust therefore selects the EEE model with 7 clusters, in which the covariance matrices are ellipsoidal with equal volume, shape and orientation.
library(mclust)
## Package 'mclust' version 5.4.6
## Type 'citation("mclust")' for citing this R package in publications.
##
## Attaching package: 'mclust'
## The following object is masked from 'package:mgcv':
##
## mvn
BIC.X <- mclustBIC(X,G=1:10)
BIC.X
## Bayesian Information Criterion (BIC):
##           EII        VII       EEI       VEI       EVI       VVI       EEE
## 1  -12583.443 -12583.443 -5648.159 -5648.159 -5648.159 -5648.159 -4557.488
## 2  -11409.743 -12200.271 -5277.208 -5431.105 -5284.327 -5460.857 -4602.526
## 3  -11068.244 -11334.799 -5141.822 -5179.992 -5142.603 -5201.065 -4579.655
## 4  -10419.067 -10127.496 -4903.027 -4871.802 -5061.562 -4987.023 -4506.117
## 5   -9904.256  -9849.151 -4821.565 -4671.297 -4996.716 -4809.421 -4480.593
## 6   -9558.750         NA -4687.532        NA        NA        NA -4404.418
## 7   -9506.018         NA -4611.760        NA        NA        NA -4398.732
## 8   -9375.226         NA -4575.698        NA        NA        NA -4400.277
## 9   -9367.565         NA -4507.056        NA        NA        NA -4443.929
## 10  -9432.011         NA -4528.243        NA        NA        NA -4449.679
##          EVE       VEE       VVE       EEV       VEV       EVV       VVV
## 1  -4557.488 -4557.488 -4557.488 -4557.488 -4557.488 -4557.488 -4557.488
## 2         NA        NA        NA -5256.385 -4946.287        NA        NA
## 3         NA        NA        NA -5873.695 -5103.235        NA        NA
## 4         NA        NA        NA -6337.032 -5257.149        NA        NA
## 5         NA        NA        NA -6386.550 -5192.290        NA        NA
## 6         NA        NA        NA -7165.447        NA        NA        NA
## 7         NA        NA        NA -7184.509        NA        NA        NA
## 8         NA        NA        NA -8138.750        NA        NA        NA
## 9         NA        NA        NA        NA        NA        NA        NA
## 10        NA        NA        NA        NA        NA        NA        NA
##
## Top 3 models based on the BIC criterion:
##     EEE,7     EEE,8     EEE,6
## -4398.732 -4400.277 -4404.418
plot(BIC.X)
Mclust.X <- Mclust(X,x=BIC.X)
summary(Mclust.X)
## ----------------------------------------------------
## Gaussian finite mixture model fitted by EM algorithm
## ----------------------------------------------------
##
## Mclust EEE (ellipsoidal, equal volume, shape and orientation) model with 7
## components:
##
##  log-likelihood  n  df       BIC       ICL
##       -1375.598 73 384 -4398.732 -4398.749
##
## Clustering table:
##  1  2  3  4  5  6  7
## 13 21 17  5  8  8  1
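The figures in the summary can be checked by hand. For the EEE model with \(G\) components in \(d\) dimensions, the free parameters are \(G\,d\) means, one shared covariance matrix with \(d(d+1)/2\) entries, and \(G-1\) mixing proportions, and mclust reports the BIC as \(2\log L - \mathrm{df}\,\log n\), so larger values are better. The sketch below assumes \(d=21\) (one coefficient per Fourier basis function), which is consistent with the reported df of 384; the remaining values are copied from the summary output.

```r
# Values taken from the summary output; d = 21 is an assumption
# (one coefficient per Fourier basis function)
G <- 7
d <- 21
n <- 73
loglik <- -1375.598

# Free parameters of the EEE model: means + shared covariance + mixing proportions
df.EEE <- G * d + d * (d + 1) / 2 + (G - 1)

# BIC in the mclust sign convention (larger is better)
bic.EEE <- 2 * loglik - df.EEE * log(n)
print(df.EEE)
print(bic.EEE)
```

This gives 384 parameters and a BIC of approximately -4398.73, in agreement with the summary up to the rounding of the log-likelihood. With 384 parameters estimated from only 73 curves, the selected model is heavily parameterized, which is worth keeping in mind when interpreting the seven clusters.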
colors.Mclust.X <- c("pink","blue","green","orange","purple","red","gray")[Mclust.X$classification]
pairs(X,pch=19,col=colors.Mclust.X,
main="MClust solution with the coefficients of the basis expansions")
plot(smooth.data.temp,
lty=1,lwd=2,col=colors.Mclust.X,
main="MClust solution with the coefficients of the basis expansions",
xlab="Day",ylab="Temperature")
## [1] "done"
# Leaflet map: one marker per station, colored by Mclust cluster
icons <- awesomeIcons(
icon = 'ios-close',
iconColor = 'black',
library = 'ion',
markerColor = colors.Mclust.X
)
map <- leaflet() %>%
addTiles() %>%
addAwesomeMarkers(lng = coordinates[,1],
lat = coordinates[,2],
popup = aemet$df$name,
icon=icons)
map
As we can see, this method gives a different partition than k-means and PAM. On the map, the seven clusters follow geographical patterns, with three exceptions: Tarifa, which is located far from the rest of the pink curves; Granada, which is assigned to the blue group; and Izana, which forms a group of its own. These three cases are understandable. Izana behaves differently from the other stations in the Canary Islands, since it does not reach temperatures as high as the rest. Tarifa shows somewhat higher temperatures than its closest stations because it is located close to the sea. Both Granada stations show lower temperatures because they are located at higher altitudes, so they resemble the stations in the north. Additionally, this solution clearly separates the west coast and east coast stations, which was not the case before, and there is a new red group formed by stations that were part of the central blue group. Consequently, this is a more detailed partition than those given by the other methods.
Now, we present the k-medoids method, implemented in kmeans.fd, a variant of the k-means algorithm for functional data that uses functional depths to define the group centroids. To apply it, we first convert the temperatures from the fda format to the fda.usc format.
Recall that, by default, kmeans.fd uses the Fraiman and Muniz depth to rank the curves and then computes each group center from the deepest curves after trimming with \(\alpha=0.05\).
We set \(G=5\), since it seemed to give a good partition for describing the behavior of the stations according to the zones where they are located. As we can see, the curves with the highest temperatures are set apart, as are the curves with the lowest temperatures, and the rest are divided into three groups according to the level of temperatures that they reach.
set.seed(1)
tt <- 1 : 365
temp.smooth <- eval.fd(tt,smooth.data.temp$fd)
fdataobj.temp <- fdata(t(temp.smooth),tt)
kmeans.temp <- kmeans.fd(fdataobj.temp,ncl=5,cluster.size=0,max.iter=500)
colors.kmeansfd <- c("green","orange","purple","blue","pink")[kmeans.temp$cluster]
par(mfrow=c(1,1))
plot(smooth.data.temp,lty=1,lwd=2,col=colors.kmeansfd,
main="Functional kmeans solution",xlab="Day",ylab="Temperature")
## [1] "done"
# Leaflet map: one marker per station, colored by functional k-means cluster (5 groups)
icons <- awesomeIcons(
icon = 'ios-close',
iconColor = 'black',
library = 'ion',
markerColor = colors.kmeansfd
)
map <- leaflet() %>%
addTiles() %>%
addAwesomeMarkers(lng = coordinates[,1],
lat = coordinates[,2],
popup = aemet$df$name,
icon=icons)
map
Above, we present a map of the stations clustered using this method. As we can see, there are some similarities to the previous methods. The purple group remains the same, containing the stations of the Canary Islands except Tenerife/Los Rodeos and Izana. Moreover, the coastal stations are split: the north and east ones are part of the orange group, while the southwest ones form another group. There is also a single curve in the center that forms the pink group; it corresponds to Navacerrada, which was identified as an outlier because it presents the lowest temperatures of the set. Finally, the blue group contains many curves located in the north, which show a stable seasonal pattern of change throughout the year.
Additionally, we run k-medoids again with \(G=2\) to assess the performance of the method. As we can see, the resulting partition differs from the one given by the agglomerative hierarchical clustering, which set apart a well-defined group of extreme curves that maintain higher temperatures throughout the year. Here, instead, one group contains the stations with higher temperatures and the other those with lower temperatures, without taking into account the variability over the seasons. In the curve plot, the green smoothed curves are almost all of those located at the top, while the orange group contains those with lower temperatures at the bottom.
Finally, comparing both cases with the previous methods, we can say that this partition reflects the behavior of the curves, separating them well according to the temperatures reached, but it does not partition them well with respect to more detailed behavior such as the variability throughout the year. Other methods, namely k-means, PAM and Mclust, did capture some of these characteristics of the groups and achieved a clear distinction between zones. Consequently, k-means with five clusters (judged by the average silhouette) and Mclust with 7 clusters (judged by the BIC) could be the best methods for partitioning the station records into zones based on both their daily average temperatures and their variability throughout the seasons.
kmeans.temp <- kmeans.fd(fdataobj.temp,ncl=2,cluster.size=0,max.iter=500)
colors.kmeansfd <- c("green","orange")[kmeans.temp$cluster]
par(mfrow=c(1,1))
plot(smooth.data.temp,lty=1,lwd=2,col=colors.kmeansfd,
main="Functional kmeans solution",xlab="Day",ylab="Temperature")
## [1] "done"
# Leaflet map: one marker per station, colored by functional k-means cluster (2 groups)
icons <- awesomeIcons(
icon = 'ios-close',
iconColor = 'black',
library = 'ion',
markerColor = colors.kmeansfd
)
library(leaflet)
map <- leaflet() %>%
addTiles() %>%
addAwesomeMarkers(lng = coordinates[,1],
lat = coordinates[,2],
popup = aemet$df$name,
icon=icons)
map
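The comparisons between partitions made throughout this section rely on inspecting the maps. A complementary, numerical check is to cross-tabulate two label vectors and compute the share of station pairs on which they agree (the Rand index). The sketch below uses two hypothetical label vectors for eight stations; the values are illustrative only and are not results from the analysis.

```r
# Hypothetical cluster labels for eight stations from two methods
# (illustrative values only, not results from the analysis)
labels.a <- c(1, 1, 1, 2, 2, 2, 2, 2)
labels.b <- c(1, 1, 2, 2, 2, 3, 3, 3)

# Rows: clusters of method A; columns: clusters of method B
cross.tab <- table(methodA = labels.a, methodB = labels.b)
print(cross.tab)

# Rand index: a pair of stations counts as an agreement if both methods
# put the two stations together, or both methods keep them apart
same.a <- outer(labels.a, labels.a, "==")
same.b <- outer(labels.b, labels.b, "==")
pairs.idx <- upper.tri(same.a)  # each unordered pair once
rand.index <- mean(same.a[pairs.idx] == same.b[pairs.idx])
print(rand.index)
```

In the analysis, the label vectors would be objects such as `pam.X$clustering`, `cutree(average.X,2)`, `Mclust.X$classification` or `kmeans.temp$cluster`. The cluster numbering is arbitrary, so the table should be read by blocks rather than along the diagonal.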